Abstract

This paper describes two new bunsetsu identification 
methods using supervised learning. Since Japanese 
syntactic analysis is usually done after bunsetsu 
identification, bunsetsu identification is important 
for analyzing Japanese sentences. In experiments 
comparing the four previously available machine- 
learning methods (decision tree, maximum-entropy 
method, example-based approach and decision list) 
and two new methods using category-exclusive rules, 
the new method using the category-exclusive rules 
with the highest similarity performed best. 

1 Introduction 

This paper is about machine-learning methods for
identifying bunsetsus, which correspond to English
phrasal units such as noun phrases and prepositional
phrases. Since Japanese syntactic analysis is usually
done after bunsetsu identification (Uchimoto et al.,
1999), identifying bunsetsus is important for analyzing
Japanese sentences. Conventional studies on bunsetsu
identification^1 have used hand-made rules (Kameda,
1995; Kurohashi, 1998), but bunsetsu identification is
not an easy task, and these rules were developed at the
cost of many man-hours. Kurohashi, for example, made
146 rules for bunsetsu identification (Kurohashi, 1998).

In an attempt to reduce the number of man-hours, we
used machine-learning methods for bunsetsu
identification. Because it was not clear which
machine-learning method would be the most appropriate
for bunsetsu identification, we tried a variety of
them. In this paper we report experiments comparing
four machine-learning methods (decision tree, maximum
entropy, example-based, and decision list methods) and
our new methods using category-exclusive rules.

^1 Bunsetsu identification is a problem similar to
chunking (Ramshaw and Marcus, 1995; Sang and Veenstra,
1999) in other languages.



2 Bunsetsu identification problem 

We conducted experiments on the following super- 
vised learning methods for identifying bunsetsu: 

• Decision tree method 

• Maximum entropy method 

• Example-based method (use of similarity) 

• Decision list (use of probability and frequency) 

• Method 1 (use of exclusive rules) 

• Method 2 (use of exclusive rules with the high- 
est similarity). 

In general, bunsetsu identification is done after 
morphological and before syntactic analysis. Mor- 
phological analysis corresponds to part-of-speech 
tagging in English. Japanese syntactic structures are 
usually represented by the relations between bun- 
setsus, which correspond to phrasal units such as a 
noun phrase or a prepositional phrase in English. 
So, bunsetsu identification is important in Japanese 
sentence analysis. 

In this paper, we identify a bunsetsu by using
information from a morphological analysis. Bunsetsu
identification is treated as the task of deciding
whether to insert a "|" mark to indicate the partition
between two bunsetsus, as in Figure 1. Bunsetsu
identification is therefore done by judging whether or
not a partition mark should be inserted between two
adjacent morphemes. (For the sake of simplicity, we do
not use the inserted partition marks in the subsequent
analysis in this paper.)
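
The formulation above can be sketched in code. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `gap_labels` and the list-of-lists input format are hypothetical. It turns a sentence that is already divided into bunsetsus into one binary label per space between adjacent morphemes.

```python
def gap_labels(bunsetsus):
    """Flatten bunsetsus into morphemes and label each space between
    adjacent morphemes: 1 if a partition mark belongs there, else 0."""
    morphemes = []
    labels = []
    for bunsetsu in bunsetsus:
        for i, morpheme in enumerate(bunsetsu):
            if morphemes:
                # The space before the first morpheme of a bunsetsu is a partition.
                labels.append(1 if i == 0 else 0)
            morphemes.append(morpheme)
    return morphemes, labels


# The sentence of Figure 1: boku ga | bunsetsu wo | matomeageru
morphemes, labels = gap_labels([["boku", "ga"], ["bunsetsu", "wo"], ["matomeageru"]])
print(morphemes)  # ['boku', 'ga', 'bunsetsu', 'wo', 'matomeageru']
print(labels)     # [0, 1, 0, 1]
```

Each element of `labels` is one classification instance, so a sentence of n morphemes yields n - 1 instances.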

Our bunsetsu identification method uses the
morphological information of the two morphemes
preceding and the two morphemes succeeding the space
being analyzed. We use the following morphological
information:

(i) Major part-of-speech (POS) category,^2

(ii) Minor POS category or inflection type,

(iii) Semantic information (the first three digits of
a category number as used in "BGH" (NLRI, 1964)),

(iv) Word (lexical information).

For simplicity we do not use the "Semantic
information" and "Word" of either of the two outside
morphemes.

Figure 2 shows the information used to judge whether or
not to insert a partition mark in the space between two
adjacent morphemes, "wo (obj)" and "kugiru (divide),"
in the sentence "bun wo kugiru. ((I) divide
sentences)."

^2 Part-of-speech categories follow those of JUMAN
(Kurohashi and Nagao, 1991).

boku  ga                        |  bunsetsu    wo                       |  matomeageru
(I)   nominative-case particle     (bunsetsu)  objective-case particle     (identify)
(I identify bunsetsu.)

Figure 1: Example of identified bunsetsus

Figure 2: Information used in bunsetsu identification

3 Bunsetsu identification process for
each machine-learning method

3.1 Decision-tree method 



In this work we used the program C4.5 (Quinlan, 1995)
for the decision-tree learning method. The four types
of information mentioned in the previous section, (i)
major POS, (ii) minor POS, (iii) semantic information,
and (iv) word, were used as features in the
decision-tree learning method. As shown in Figure 3,
the number of features is 12 (2 + 4 + 4 + 2) because we
do not use (iii) semantic information and (iv) word
information from the two outside morphemes.

In Figure 2, for example, the value of the feature
'the major POS of the far left morpheme' is 'Noun.'
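
As an illustration of the 12-feature layout, the sketch below builds the feature values for one space. The `Morpheme` type and the function `decision_tree_features` are hypothetical names introduced for this example, and the field values are taken from the example pattern for "bun wo kugiru." quoted in Sections 3.2 and 3.4. It shows only the feature layout, not how C4.5 consumes it.

```python
from typing import NamedTuple, Optional


class Morpheme(NamedTuple):
    major_pos: str
    minor_pos: str
    semantic: Optional[str] = None  # first three digits of the BGH category number
    word: Optional[str] = None


def decision_tree_features(far_left, left, right, far_right):
    """Return the 12 feature values (2 + 4 + 4 + 2) for one space."""
    return [
        far_left.major_pos, far_left.minor_pos,
        left.major_pos, left.minor_pos, left.semantic, left.word,
        right.major_pos, right.minor_pos, right.semantic, right.word,
        far_right.major_pos, far_right.minor_pos,
    ]


# Space between "wo" and "kugiru" in "bun wo kugiru."
features = decision_tree_features(
    Morpheme("Noun", "Normal Noun"),
    Morpheme("Particle", "Case-Particle", "none", "wo"),
    Morpheme("Verb", "Normal Form", "217", "kugiru"),
    Morpheme("Symbol", "Punctuation"),
)
print(len(features))  # 12
print(features[0])    # Noun
```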

3.2 Maximum-entropy method 

The maximum-entropy method is useful under sparse data
conditions and has been used by many researchers
(Berger et al., 1996; Ratnaparkhi, 1996; Ratnaparkhi,
1997; Borthwick et al., 1998; Uchimoto et al., 1999).
In our maximum-entropy experiment we used Ristad's
system (Ristad, 1998). The analysis is performed by
calculating, from the output of the system, the
probability of inserting and of not inserting a
partition mark. Whichever probability is higher is
selected as the desired answer.



In the maximum-entropy method, we use the same four
types of morphological information, (i) major POS, (ii)
minor POS, (iii) semantic information, and (iv) word,
as in the decision-tree method. Unlike the
decision-tree method, however, the maximum-entropy
method does not consider combinations of features, so
we had to combine features manually.

First we considered combinations of the four types of
morphological information. Because there were four
types of information, the total number of combinations
was 2^4 - 1. Since this number is large and
intractable, we assumed that (i) major POS, (ii) minor
POS, (iii) semantic information, and (iv) word
information gradually become more specific in this
order, and we combined the four types of information in
the following way:

Information A: (i) major POS
Information B: (i) major POS and (ii) minor POS
Information C: (i) major POS, (ii) minor POS and
               (iii) semantic information
Information D: (i) major POS, (ii) minor POS,
               (iii) semantic information and (iv) word
                                                    (1)

We used only Information A and B for the two outside
morphemes because, as in the decision-tree method, we
did not use their semantic and word information.
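
The nesting of Information A to D can be made concrete with a short sketch. The function name `information_levels` and the ": " separator are assumptions chosen to match the pattern strings quoted in this paper, and the code is only an illustration of Eq. (1).

```python
def information_levels(major_pos, minor_pos, semantic=None, word=None):
    """Build Information A-D for one morpheme, from least to most specific.
    Levels that would need a missing field are left out."""
    fields = [major_pos, minor_pos, semantic, word]
    levels = {}
    for name, depth in zip("ABCD", range(1, 5)):
        used = fields[:depth]
        if None in used:
            break
        levels[name] = ": ".join(used)
    return levels


# A middle morpheme has all four levels; an outside morpheme only A and B.
middle = information_levels("Particle", "Case-Particle", "none", "wo")
outside = information_levels("Noun", "Normal Noun")
print(middle["D"])      # Particle: Case-Particle: none: wo
print(sorted(outside))  # ['A', 'B']
```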

Next, we considered the combinations of each type of
information. As shown in Figure 4, the number of
combinations was 64 (2 x 4 x 4 x 2).

To cope with data sparseness, in addition to the above
combinations, we considered the cases in which, first,
one of the two outside morphemes is not used, second,
neither of the two outside morphemes is used, and
third, only one of the two middle morphemes is used.
The number of features used in the maximum-entropy
method is 152, which is obtained as follows:^3

$$
\begin{aligned}
\text{No. of features} &= 2 \times 4 \times 4 \times 2 \\
&\quad + 2 \times 4 \times 4 \\
&\quad + 4 \times 4 \times 2 \\
&\quad + 4 \times 4 \\
&\quad + 4 \\
&\quad + 4 \\
&= 152
\end{aligned}
$$

Far left morpheme   Left morpheme          Right morpheme         Far right morpheme
Major POS           Major POS              Major POS              Major POS
Minor POS           Minor POS              Minor POS              Minor POS
                    Semantic Information   Semantic Information
                    Word                   Word
2                   4                      4                      2

Figure 3: Features used in the decision-tree method

Far left morpheme   Left morpheme     Right morpheme    Far right morpheme
Information A       Information A     Information A     Information A
Information B       Information B     Information B     Information B
                    Information C     Information C
                    Information D     Information D
2                   4                 4                 2

Figure 4: Features used in the maximum-entropy method

In Figure 2, the feature that uses Information B for
the far left morpheme, Information D for the left
morpheme, Information C for the right morpheme, and
Information A for the far right morpheme is "Noun:
Normal Noun; Particle: Case-Particle: none: wo; Verb:
Normal Form: 217; Symbol". In the maximum-entropy
method we used 152 features such as this one for each
space.
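
The count of 152 can be checked by enumerating the patterns directly. The sketch below is an illustration only: the tuple representation and the use of `None` for an unused morpheme are assumptions made for this example.

```python
from itertools import product

OUTER = ("A", "B")            # information levels for the two outside morphemes
INNER = ("A", "B", "C", "D")  # information levels for the two middle morphemes


def enumerate_patterns():
    """Return all patterns as (far_left, left, right, far_right) tuples,
    where None means that the morpheme is not used."""
    patterns = set()
    # Both middle morphemes used; each outside morpheme used or dropped:
    # 2x4x4x2 + 2x4x4 + 4x4x2 + 4x4 = 144 patterns.
    for pattern in product(OUTER + (None,), INNER, INNER, OUTER + (None,)):
        patterns.add(pattern)
    # Only one of the two middle morphemes used: 4 + 4 patterns.
    for level in INNER:
        patterns.add((None, level, None, None))
        patterns.add((None, None, level, None))
    return patterns


patterns = enumerate_patterns()
print(len(patterns))  # 152
```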

3.3 Example-based method (use of 
similarity) 

An example-based method was proposed by Nagao (Nagao,
1984) in an attempt to solve problems in machine
translation. To resolve a problem, it uses the most
similar example. In the present work, the example-based
method used the same four types of information (see
Eq. (1)) as the maximum-entropy method.

To use this method, we must define the similarity 
of an input to an example. We use the 152 patterns 
from the maximum-entropy method to establish the 
level of similarity. We define the similarity S be- 
tween an input and an example according to which 
one of these 152 levels is the matching level, as fol- 
lows. (The equation reflects the importance of the 
two middle morphemes.) 



$$
S = s(m_{-1}) \times s(m_{+1}) \times 10{,}000 + s(m_{-2}) \times s(m_{+2}) \tag{2}
$$

Here $m_{-1}$, $m_{+1}$, $m_{-2}$, and $m_{+2}$ refer
respectively to the left, right, far left, and far
right morphemes, and $s(x)$ is the morphological
similarity of a morpheme $x$, which is defined as
follows:

$$
s(x) =
\begin{cases}
1 & \text{when no information of } x \text{ is matched} \\
2 & \text{when Information A of } x \text{ is matched} \\
3 & \text{when Information B of } x \text{ is matched} \\
4 & \text{when Information C of } x \text{ is matched} \\
5 & \text{when Information D of } x \text{ is matched}
\end{cases}
\tag{3}
$$

^3 When we extracted features by this method from all
of the articles of January 1, 1995 in a Kyoto
University corpus (the number of spaces between
morphemes was 25,814), the number of types of features
was 1,534,701.

Figure 5 shows an example of the levels of similarity.
When a pattern matches Information A of all four
morphemes, such as "Noun; Particle; Verb; Symbol", its
similarity is 40,004 (2 x 2 x 10,000 + 2 x 2). When a
pattern matches only Information D of the left
morpheme, such as
" — ; Particle: Case-Particle: none: wo; — ; — ", its 
similarity is 50,001 (5 x 1 x 10,000 + 1 x 1).
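
The two worked values above can be reproduced with a few lines of code. This is a sketch of Eqs. (2) and (3) only: the name `similarity` and the encoding of matching levels as `None`, "A", "B", "C", "D" are assumptions made for the example.

```python
# s(x) of Eq. (3), keyed by the most specific information level that matches.
SCORE = {None: 1, "A": 2, "B": 3, "C": 4, "D": 5}


def similarity(far_left, left, right, far_right):
    """Eq. (2): the product for the two middle morphemes is weighted by 10,000."""
    return (SCORE[left] * SCORE[right] * 10_000
            + SCORE[far_left] * SCORE[far_right])


print(similarity("A", "A", "A", "A"))     # 40004
print(similarity(None, "D", None, None))  # 50001
```

Because the product for the outside morphemes is at most 3 x 3 = 9, it can never outweigh a difference in the middle morphemes, which is how the equation reflects their importance.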

The example-based method extracts the example with the
highest level of similarity and checks whether or not
that example is marked. A partition mark is inserted in
the input data only when the example is marked. When
multiple examples have the same highest level of
similarity, the selection of the best example is
ambiguous. In this case, we count the number of marked
and unmarked spaces among all of these examples and
choose the larger.
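
The selection step just described can be sketched as follows. The function name and the `(similarity, is_marked)` input format are hypothetical. The handling of an exact tie between marked and unmarked examples is not stated in the text, so resolving it as "no partition" is an assumption of this sketch.

```python
def example_based_decision(scored_examples):
    """scored_examples: (similarity, is_marked) pairs, one per training example.
    Return True if a partition mark should be inserted."""
    scored_examples = list(scored_examples)
    best = max(score for score, _ in scored_examples)
    # Keep only the examples that share the highest similarity.
    votes = [marked for score, marked in scored_examples if score == best]
    marked = sum(votes)
    unmarked = len(votes) - marked
    # Assumption: an exact tie is resolved as "no partition".
    return marked > unmarked


print(example_based_decision([(40004, False), (50001, True)]))  # True
print(example_based_decision(
    [(50001, True), (50001, False), (50001, False), (40004, True)]))  # False
```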

3.4 Decision-list method (use of probability 
and frequency) 

The decision-list method was proposed by Rivest
(Rivest, 1987). In this method the rules are not
expressed as a tree structure as in the decision-tree
method, but are expanded by combining all the features
and are stored in a one-dimensional list. A priority
order is defined in a certain way and all of the rules
are arranged in this order. The decision-list method
searches for rules from the top of the list and
analyzes a particular problem by using only the first
applicable rule.

Figure 5: Example of levels of similarity

In this study we used in the decision-list method 
the same 152 types of patterns that were used in the 
maximum-entropy method. 

To determine the priority order of the rules, we
referred to Yarowsky's method (Yarowsky, 1994) and
Nishiokayama's method (Nishiokayama et al., 1998) and
used the probability and frequency of each rule as
measures of this priority order. When multiple rules
had the same probability, the rules were arranged in
order of their frequency.

Suppose, for example, that Pattern A "Noun: Normal
Noun; Particle: Case-Particle: none: wo; Verb: Normal
Form: 217; Symbol: Punctuation" occurs 13 times in a
learning set and that ten of the occurrences include
the inserted partition mark. Suppose also that Pattern
B "Noun; Particle; Verb; Symbol" occurs 123 times in a
learning set and that 90 of the occurrences include the
mark.

These statistics yield the following rules:

Pattern A => Partition 76.9% (10/ 13), Freq. 13
Pattern B => Partition 73.2% (90/123), Freq. 123

Many similar rules were made and were then listed in
order of their probabilities and, for any one
probability, in order of their frequencies. This list
was searched from the top and the answer was obtained
by using the first applicable rule.
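
The ordering and lookup can be sketched as below. The function names and the count format are hypothetical, "Pattern Z" is an invented pattern added only to show a rule that outranks Pattern A, and choosing "partition" when the two counts are equal is an assumption of this sketch.

```python
from fractions import Fraction


def build_decision_list(pattern_counts):
    """pattern_counts maps a pattern to (n_partition, n_non_partition).
    Return rules sorted by probability, then by frequency, highest first."""
    rules = []
    for pattern, (n_part, n_non) in pattern_counts.items():
        frequency = n_part + n_non
        # Assumption: equal counts are resolved in favor of "partition".
        if n_part >= n_non:
            category, hits = "partition", n_part
        else:
            category, hits = "non-partition", n_non
        rules.append((Fraction(hits, frequency), frequency, pattern, category))
    rules.sort(key=lambda rule: (rule[0], rule[1]), reverse=True)
    return rules


def classify(rules, applicable_patterns):
    """Return the category of the first rule whose pattern applies."""
    for _probability, _frequency, pattern, category in rules:
        if pattern in applicable_patterns:
            return category
    return None


rules = build_decision_list({
    "Pattern A": (10, 3),   # 10/13, from the text
    "Pattern B": (90, 33),  # 90/123, from the text
    "Pattern Z": (0, 5),    # hypothetical, for illustration only
})
print(classify(rules, {"Pattern A", "Pattern B"}))  # partition
print(classify(rules, {"Pattern B", "Pattern Z"}))  # non-partition
```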

3.5 Method 1 (use of category-exclusive 
rules) 

So far, we have described the four existing machine 
learning methods. In the next two sections we de- 
scribe our methods. 

It is reasonable to consider the 152 patterns used in
three of the previous methods. Now, let us suppose that
the 152 patterns from the learning set yield the
statistics of Figure 6.



"Partition" means that the rule determines that a 
partition mark should be inserted in the input data 
and "non-partition" means that the rule determines 
that a partition mark should not be inserted. 

Suppose that Patterns A to G are applicable when we
solve a hypothetical problem. If we use the
decision-list method, only Rule A, which is applied
first, is used, and it determines that a partition mark
should not be inserted. Although the frequency of each
of Rules B, C, and D is lower than that of Rule A, the
sum of their frequencies is higher, so we think that it
is better to use Rules B, C, and D than Rule A. Method
1 follows this idea, but we do not simply sum up the
frequencies. Instead, we count the number of examples
used in Rules B, C, and D and judge the category having
the largest number of examples that satisfy the
patterns with the highest probability to be the desired
answer.

For example, suppose that in the above example 
the number of examples satisfying Rules B, C, and 
D is 65. (Because some examples overlap in multi- 
ple rules, the total number of examples is actually 
smaller than the total number of the frequencies of 
the three rules.) In this case, among the examples 
used by the rules having 100% probability, the num- 
ber of examples of partition is 65, and the number 
of examples of non-partition is 34. So, we determine 
that the desired answer is to partition. 
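
The counting in this example can be sketched as follows. The function name `method1`, the tuple format, and the example ids are hypothetical. The id ranges are chosen only so that Rules B, C, and D overlap and cover 65 distinct examples, as in the text, and each rule is assumed to list the matching training examples that belong to its own category.

```python
from fractions import Fraction


def method1(applicable_rules):
    """applicable_rules: (category, probability, example_ids) triples.
    Count distinct examples per category among the rules with the highest
    probability and return the winning category with the counts."""
    best = max(prob for _, prob, _ in applicable_rules)
    covered = {}
    for category, prob, example_ids in applicable_rules:
        if prob == best:
            # A set union counts an example shared by several rules only once.
            covered.setdefault(category, set()).update(example_ids)
    counts = {category: len(ids) for category, ids in covered.items()}
    return max(counts, key=counts.get), counts


rules = [
    ("non-partition", Fraction(34, 34), set(range(0, 34))),       # Rule A
    ("partition", Fraction(33, 33), set(range(100, 133))),        # Rule B
    ("partition", Fraction(25, 25), set(range(121, 146))),        # Rule C
    ("partition", Fraction(19, 19), set(range(146, 165))),        # Rule D
    ("partition", Fraction(100, 123), set(range(200, 300))),      # Rule E
    ("partition", Fraction(10, 13), set(range(300, 310))),        # Rule F
    ("non-partition", Fraction(310, 540), set(range(400, 710))),  # Rule G
]
answer, counts = method1(rules)
print(answer)                                         # partition
print(counts["partition"], counts["non-partition"])   # 65 34
```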

A rule having 100% probability is called a
category-exclusive rule because all the data satisfying
it belong to one category, which is either partition or
non-partition. Because the number of rules used for any
given space can be as large as 152, category-exclusive
rules are applied often.^4 Method 1 uses all of these
category-exclusive rules, so we call it the method
using category-exclusive rules.

Solving problems by using rules whose probabilities are
not 100% may result in wrong solutions. Almost all of
the traditional machine-learning methods solve problems
by using rules whose probabilities are not 100%. By
using such methods, we cannot hope to improve accuracy.
If we want to improve accuracy, we must use
category-exclusive rules. There are some cases,
however, in which category-exclusive rules are rarely
applied even if we take this approach. In such cases,
we must add new features to the analysis to create a
situation in which many category-exclusive rules can be
applied.

However, it is not sufficient to use category-exclusive
rules. There are many meaningless rules which happen to
be category-exclusive only in a learning set. We must
consider how to eliminate such meaningless rules.

^4 The ratio of the spaces analyzed by using
category-exclusive rules is 99.30% (16864/16983) in
Experiment 1 of Section 4. This indicates that almost
all of the spaces are analyzed by category-exclusive
rules.

Rule A: Pattern A => probability of non-partition  100%   ( 34/ 34)  Frequency  34
Rule B: Pattern B => probability of partition      100%   ( 33/ 33)  Frequency  33
Rule C: Pattern C => probability of partition      100%   ( 25/ 25)  Frequency  25
Rule D: Pattern D => probability of partition      100%   ( 19/ 19)  Frequency  19
Rule E: Pattern E => probability of partition      81.3%  (100/123)  Frequency 123
Rule F: Pattern F => probability of partition      76.9%  ( 10/ 13)  Frequency  13
Rule G: Pattern G => probability of non-partition  57.4%  (310/540)  Frequency 540

Figure 6: An example of rules used in Method 1

3.6 Method 2 (using category-exclusive 
rules with the highest similarity) 

Method 2 combines the example-based method and 
Method 1. That is, it combines the method using 
similarity and the method using category-exclusive 
rules in order to eliminate the meaningless category- 
exclusive rules mentioned in the previous section. 

Method 2 also uses the 152 patterns for identifying
bunsetsus. These patterns are used as rules in the same
way as in Method 1. Desired answers are determined by
using the rule having the highest probability. When
multiple rules have the same probability, Method 2 uses
the value of the similarity described in the section on
the example-based method and analyzes the problem with
the rule having the highest similarity. When multiple
rules have the same probability and similarity, the
method takes the examples used by the rules having the
highest probability and the highest similarity, and
chooses the category with the larger number of examples
as the desired answer, in the same way as in Method 1.

However, when category-exclusive rules with a frequency
greater than one exist, the above procedure is
performed after eliminating all of the
category-exclusive rules with a frequency of one. In
other words, category-exclusive rules with a frequency
greater than one are given a higher priority than
category-exclusive rules that have a frequency of only
one but a higher similarity. This is because
category-exclusive rules with a frequency of only one
are not very reliable.
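
The selection procedure of Method 2 can be sketched as below. The `Rule` type, its field names, and the sample rules are hypothetical, and the code is an illustration of the steps in the text rather than the implementation used in the experiments. If the final example counts are equal, the sketch returns the category seen first, which is an assumption because the text does not cover that case.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Rule:
    category: str          # "partition" or "non-partition"
    probability: Fraction  # share of the pattern's training examples in `category`
    frequency: int         # number of training examples matching the pattern
    similarity: int        # S of the pattern, as in Eq. (2)
    examples: frozenset    # ids of matching training examples in `category`


def method2(applicable):
    """Choose a category from the rules applicable to one space."""
    def exclusive(rule):
        return rule.probability == 1

    # Drop category-exclusive rules seen only once when a more frequent
    # category-exclusive rule exists.
    if any(exclusive(r) and r.frequency > 1 for r in applicable):
        applicable = [r for r in applicable
                      if not (exclusive(r) and r.frequency == 1)]

    best_p = max(r.probability for r in applicable)
    top = [r for r in applicable if r.probability == best_p]
    best_s = max(r.similarity for r in top)
    top = [r for r in top if r.similarity == best_s]

    # As in Method 1, count distinct examples per category among the tied rules.
    votes = {}
    for r in top:
        votes.setdefault(r.category, set()).update(r.examples)
    return max(votes, key=lambda c: len(votes[c]))


rules = [
    Rule("non-partition", Fraction(1, 1), 1, 50004, frozenset({1})),
    Rule("partition", Fraction(3, 3), 3, 40004, frozenset({2, 3, 4})),
    Rule("partition", Fraction(90, 123), 123, 20001, frozenset(range(100, 190))),
]
print(method2(rules))                  # partition
print(method2(rules[:1] + rules[2:]))  # non-partition
```

In the first call the rule seen only once is eliminated despite its higher similarity. In the second call no more frequent category-exclusive rule exists, so it is kept and wins on probability.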



4 Experiments and discussion 

In our experiments we used a Kyoto University text
corpus (Kurohashi and Nagao, 1997), which is a 
tagged corpus made up of articles from the Mainichi 
newspaper. All experiments reported in this paper 
were performed using articles dated from January 
1 to 5, 1995. We obtained the correct information 
on morphology and bunsetsu identification from the 
tagged corpus. 

The following experiments were conducted to de- 
termine which supervised learning method achieves 
the highest accuracy rate. 

• Experiment 1 

Learning set: January 1, 1995 
Test set: January 3, 1995 

• Experiment 2 

Learning set: January 4, 1995 
Test set: January 5, 1995 

Because we used Experiment 1 in making Method 
1 and Method 2, Experiment 1 is a closed data set 
for Method 1 and Method 2. So, we performed Ex- 
periment 2. 

The results are listed in Tables 1 to 4. In addition to
the six methods described in Section 3, we used
KNP2.0b4 (Kurohashi, 1997) and KNP2.0b6 (Kurohashi,
1998), which are bunsetsu identification and syntactic
analysis systems using many hand-made rules. Because
KNP is based not on a machine-learning method but on
many hand-made rules, "Learning set" and "Test set" in
the tables have no meaning for the KNP results. In the
KNP experiments, we also used the morphological
information in the corpus. The "F" in the tables
indicates the F-measure, which is the harmonic mean of
recall and precision. Recall is the fraction of
correctly identified partitions out of all the
partitions. Precision is the fraction of correctly
identified partitions out of all the spaces which were
judged to have a partition mark inserted.
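
These definitions translate directly into code. The sketch below is illustrative: the function name is hypothetical and the counts in the usage example are invented, not taken from the experiments. It assumes at least one correctly identified partition, so that no denominator is zero.

```python
def precision_recall_f(correct, missed, spurious):
    """correct: partitions identified correctly;
    missed: true partitions that were not identified;
    spurious: partition marks inserted where there is no partition."""
    recall = correct / (correct + missed)
    precision = correct / (correct + spurious)
    # The F-measure is the harmonic mean of recall and precision.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, for illustration only.
precision, recall, f_measure = precision_recall_f(correct=90, missed=30, spurious=10)
print(precision, recall)  # 0.9 0.75
```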

Table 1: Results of learning set of Experiment 1

Method            F        Recall    Precision
Decision Tree     99.58%   99.66%    99.51%
Maximum Entropy   99.20%   99.35%    99.06%
Example-Based     99.98%   100.00%   99.97%
Decision List     99.98%   100.00%   99.97%
Method 1          99.98%   100.00%   99.97%
Method 2          99.98%   100.00%   99.97%
KNP 2.0b4         99.23%   99.78%    98.69%
KNP 2.0b6         99.73%   99.77%    99.69%

The number of spaces between two morphemes is 25,814.
The number of partitions is 9,523.

Table 2: Results of test set of Experiment 1

Method            F        Recall    Precision
Decision Tree     98.87%   98.67%    99.08%
Maximum Entropy   98.90%   98.75%    99.06%
Example-Based     99.02%   98.69%    99.36%
Decision List     98.95%   98.43%    99.48%
Method 1          98.98%   98.54%    99.43%
Method 2          99.16%   98.88%    99.45%
KNP 2.0b4         99.13%   99.72%    98.54%
KNP 2.0b6         99.66%   99.68%    99.64%

The number of spaces between two morphemes is 16,983.
The number of partitions is 6,166.

Table 3: Results of learning set of Experiment 2

Method            F        Recall    Precision
Decision Tree     99.70%   99.71%    99.69%
Maximum Entropy   99.07%   99.23%    98.92%
Example-Based     99.99%   100.00%   99.98%
Decision List     99.99%   100.00%   99.98%
Method 1          99.99%   100.00%   99.98%
Method 2          99.99%   100.00%   99.98%
KNP 2.0b4         98.94%   99.50%    98.39%
KNP 2.0b6         99.47%   99.47%    99.48%

The number of spaces between two morphemes is 27,665.
The number of partitions is 10,143.

Table 4: Results of test set of Experiment 2

Method            F        Recall    Precision
Decision Tree     98.50%   98.51%    98.49%
Maximum Entropy   98.57%   98.55%    98.59%
Example-Based     98.82%   98.71%    98.93%
Decision List     98.75%   98.27%    99.23%
Method 1          98.79%   98.54%    99.43%
Method 2          98.90%   98.65%    99.15%
KNP 2.0b4         99.07%   99.43%    98.71%
KNP 2.0b6         99.51%   99.40%    99.61%

The number of spaces between two morphemes is 32,304.
The number of partitions is 11,756.

Tables 1 to 4 show the following results:

• In the test set the decision-tree method was a little
worse than the maximum-entropy method. Although the
maximum-entropy method has a weak point in that it does
not learn the combinations of features, we could
overcome this weakness by making almost all of the
combinations of features to produce a higher accuracy
rate.

• The decision-list method was better than the
maximum-entropy method in this experiment.

• The example-based method obtained the highest
accuracy rate among the four existing methods.

• Although Method 1, which uses the category-exclusive
rule, was worse than the example-based method, it was
better than the decision-list method. One reason for
this was that the decision-list method chooses rules
randomly when multiple rules have identical
probabilities and frequencies.

• Method 2, which uses the category-exclusive rule with
the highest similarity, achieved the highest accuracy
rate among the supervised learning methods.

• The example-based method, the decision-list method,
Method 1 and Method 2 obtained accuracy rates of about
100% for the learning set. This indicates that these
methods are especially strong for learning sets.

• The two methods using similarity (example-based
method and Method 2) were always better than the other
methods, indicating that the use of similarity is
effective if we can define it appropriately.

• We carried out experiments by using KNP, a system
that uses many hand-made rules. The F-measure of KNP
was highest in the test set.

• We used two versions of KNP, KNP 2.0b4 and KNP 2.0b6.
The latter was much better than the former, indicating
that the improvements made by hand are effective.
However, the maintenance of rules by hand has a limit,
so the improvements made by hand are not always
effective.

The above experiments indicate that Method 2 is the
best among the machine-learning methods.^5

In Table 5 we show some cases which were partitioned
incorrectly with KNP but correctly with Method 2. A
partition with "NEED" indicates that KNP missed
inserting the partition mark, and a partition with
"WRONG" indicates that KNP inserted the partition mark
incorrectly. In the test set of Experiment 1, the
F-measure of KNP2.0b6 was 99.66%. The F-measure
increases to 99.83% under the assumption that the
answer is correct when either KNP2.0b6 or Method 2 is
correct. Although the accuracy rate for KNP2.0b6 was
high, there were some cases in which KNP partitioned
incorrectly and Method 2 partitioned correctly. A
combination of Method 2 with KNP2.0b6 may be able to
improve the F-measure.

^5 In these experiments, the differences were very
small. However, we think that the differences are
significant to some extent because we performed both
Experiment 1 and Experiment 2, the data we used are a
large corpus containing a few tens of thousands of
morphemes that was tagged objectively in advance, and a
difference of about 0.1% is large when the precision is
around 99%.

Table 5: Cases when KNP was incorrect and Method 2 was
correct

kotsukotsu  |NEED  gaman-shi
(steadily)         (be patient with)
(... be patient with ... steadily)

yoyuu wo               |  motte   |NEED  shirizoke
(enough strength) obj     (have)         (beat off)
(... beat off ... having enough strength)

kaisha wo    |  gurupu-wake  |WRONG  shite
company obj     (grouping)           (do)
(... do grouping companies)

The only previous research on bunsetsu identification
by machine-learning methods is the work by Zhang (Zhang
and Ozeki, 1998), which used the decision-tree method.
However, that work used only a small amount of
information for bunsetsu identification and did not
achieve high accuracy rates. (The recall rate was 97.6%
(= 2502/(2502+62)), the precision rate was 92.4%
(= 2502/(2502+205)), and the F-measure was 94.9%.)

5 Conclusion 

To solve the problem of accurate bunsetsu iden- 
tification, we carried out experiments comparing 
four existing machine-learning methods (decision- 
tree method, maximum-entropy method, example- 
based method and decision-list method). We ob- 
tained the following order of accuracy in bunsetsu 
identification. 

Example-Based > Decision List > 
Maximum Entropy > Decision Tree 

We also described a new method which uses 
category-exclusive rules with the highest similarity. 
This method performed better than the other learn- 
ing methods in our experiments. 